An Efficient Method for Discretizing Continuous Attributes
نویسندگان
چکیده
In this paper the authors present a novel method for finding optimal split points for discretization of continuous attributes. Such a method can be used in many data mining techniques for large databases. The method consists of two major steps. In the first step search space is pruned using a bisecting region method that partitions the search space and returns the point with the highest information gain based on its search. The second step consists of a hill climbing algorithm that starts with the point returned by the first step and greedily searches for an optimal point. The methods were tested using fifteen attributes from two data sets. The results show that the method reduces the number of searches drastically while identifying the optimal or near-optimal split points. On average, there was a 98% reduction in the number of information gain calculations with only 4% reduction in information gain.
منابع مشابه
Discretizing Continuous Attributes Using Information Theory
Many classification algorithms require that training examples contain only discrete values. In order to use these algorithms when some attributes have continuous numeric values, the numeric attributes must be converted into discrete ones. This paper describes a new way of discretizing numeric values using information theory. The amount of information each interval gives to the target attribute ...
متن کاملDiscretizing Continuous Attributes in AdaBoost for Text Categorization
We focus on two recently proposed algorithms in the family of “boosting”-based learners for automated text classification, AdaBoost.MH and AdaBoost.MH. While the former is a realization of the well-known AdaBoost algorithm specifically aimed at multi-label text categorization, the latter is a generalization of the former based on the idea of learning a committee of classifier sub-committees. Bo...
متن کاملDiscretizing Continuous Attributes While Learning Bayesian Networks
We introduce a method for learning Bayesian networks that handles the discretization of continuous variables as an integral part of the learning process. The main ingredient in this method is a new metric based on the Minimal Description Length principle for choosing the threshold values for the discretization while learning the Bayesian network structure. This score balances the complexity of ...
متن کاملA discretizing method for reliability computation in complex stress-strength models
This paper proposes, implements and evaluates an original discretization method for continuous random variables, in order to estimate the reliability of systems for which stress and strength are defined as complex functions, and whose reliability is not derivable through analytic techniques. This method is compared to other two discretizing approaches appeared in literature, also through a comp...
متن کاملOFP_CLASS: a hybrid method to generate optimized fuzzy partitions for classification
The discretization of values plays a critical role in data mining and knowledge discovery. The representation of information through intervals is more concise and easier to understand at certain levels of knowledge than the representation by mean continuous values. In this paper, we propose a method for discretizing continuous attributes by means of a series of fuzzy sets, which constitutes a f...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- IJDWM
دوره 6 شماره
صفحات -
تاریخ انتشار 2010